Search results for "topic model"
Showing 10 of 23 documents
Multi-label Classification Using Stacked Hierarchical Dirichlet Processes with Reduced Sampling Complexity
2018
Nonparametric topic models based on hierarchical Dirichlet processes (HDPs) allow for the number of topics to be automatically discovered from the data. The computational complexity of standard Gibbs sampling techniques for model training is linear in the number of topics. Recently, it was reduced to be linear in the number of topics per word using a technique called alias sampling combined with Metropolis Hastings (MH) sampling. We propose a different proposal distribution for the MH step based on the observation that distributions on the upper hierarchy level change slower than the document-specific distributions at the lower level. This reduces the sampling complexity, making it linear i…
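The alias-sampling trick this abstract builds on can be sketched in isolation: Walker's alias method preprocesses a discrete distribution into a table that supports O(1) draws, which is what makes cheap Metropolis-Hastings proposals over topics feasible. This is a minimal stdlib-only sketch; the example distribution is made up.

```python
import random

def build_alias_table(probs):
    """Construct Walker's alias table for O(1) sampling from a discrete distribution."""
    n = len(probs)
    scaled = [p * n for p in probs]          # rescale so the average bucket mass is 1
    accept = [0.0] * n                       # probability of keeping bucket i
    alias = [0] * n                          # fallback outcome if bucket i is rejected
    small = [i for i, s in enumerate(scaled) if s < 1.0]
    large = [i for i, s in enumerate(scaled) if s >= 1.0]
    while small and large:
        s, l = small.pop(), large.pop()
        accept[s] = scaled[s]
        alias[s] = l
        scaled[l] -= 1.0 - scaled[s]         # l donates mass to fill bucket s
        (small if scaled[l] < 1.0 else large).append(l)
    for i in small + large:                  # leftovers have mass exactly 1 (up to rounding)
        accept[i] = 1.0
    return accept, alias

def alias_draw(accept, alias, rng=random):
    """One O(1) draw: pick a bucket uniformly, then keep it or take its alias."""
    i = rng.randrange(len(accept))
    return i if rng.random() < accept[i] else alias[i]

accept, alias = build_alias_table([0.1, 0.2, 0.3, 0.4])
sample = alias_draw(accept, alias)           # an index in {0, 1, 2, 3}
```

In the MH schemes the abstract refers to, such a table is rebuilt only occasionally while many proposals are drawn from the stale table, with the MH acceptance step correcting the bias.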
2021
Staying at the forefront of learning research is challenging because many fields are developing rapidly. One such field is research on the temporal aspects of computer-supported collaborative learning (CSCL). To obtain an overview of such a field, systematic literature reviews can capture patterns in existing research. However, conducting systematic literature reviews is time-consuming and does not reveal future developments in the field. This study proposes a machine learning method based on topic modelling that takes articles from a systematic literature review on the temporal aspects of CSCL (49 original articles published before 2019) as a starting point to describe the most recent deve…
Supervised vs Unsupervised Latent Dirichlet Allocation: topic detection in lyrics
2020
Topic modeling is a type of statistical modeling for discovering the abstract "topics" that occur in a collection of documents. Latent Dirichlet Allocation (LDA) is an example of a topic model and is used to assign the text in a document to a particular topic. It builds a fixed number of topics starting from the words in each document, modeled according to a Dirichlet distribution. In this work we apply LDA to a set of songs from four famous Italian songwriters and split them into topics. This work studies the use of themes in lyrics, using statistical analysis to detect topics. The aim of the work is to highlight the main limits of the standard unsupervised LDA and to propose a supervised…
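The LDA workflow this abstract describes (fixed number of topics, documents as mixtures) can be sketched end to end with scikit-learn; the four-line "lyrics" corpus below is invented for illustration, not the paper's data.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Toy corpus standing in for song lyrics (hypothetical).
docs = [
    "love heart night moon",
    "moon night stars love",
    "war soldier fight land",
    "land war fight heart",
]

# Bag-of-words counts, then LDA with a fixed number of topics (K = 2).
X = CountVectorizer().fit_transform(docs)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)

# Each row is a document's mixture over the 2 topics and sums to 1.
doc_topics = lda.transform(X)
```

A supervised variant, as the abstract proposes, would instead constrain or seed the topic-word distributions using labeled documents.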
Statistically Validated Networks for assessing topic quality in LDA models
2022
Probabilistic topic models have become one of the most widespread machine learning techniques for textual analysis. In this framework, Latent Dirichlet Allocation (LDA) (Blei et al., 2003) has gained more and more popularity as a text modelling technique. The idea is that documents are represented as random mixtures over latent topics, where a distribution over words characterizes each topic. Unfortunately, topic models do not guarantee the interpretability of their outputs. The topics learned from the model may be characterized only by a set of irrelevant or unconnected words, making them useless for interpretation. Although many topic-quality metrics have been proposed (Newman et al., 2009; Alet…
Exploring topics in LDA models through Statistically Validated Networks: directed and undirected approaches
2022
Probabilistic topic models are machine learning tools for processing and understanding large text document collections. Among the different models in the literature, Latent Dirichlet Allocation (LDA) has become the benchmark of the topic modelling community. The key idea is to represent text documents as random mixtures over latent semantic structures called topics. Each topic follows a multinomial distribution over the vocabulary words. In order to understand the result of a topic model, researchers usually select the top-n words with the highest probability given a topic and look for meaningful, interpretable semantic themes. This work proposes a new method …
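The top-n selection step the abstract mentions is a simple argsort over the fitted topic-word matrix. The vocabulary and probabilities below are invented for illustration.

```python
import numpy as np

def top_n_words(topic_word, vocab, n=3):
    """Return the n highest-probability words for each topic (rows of topic_word)."""
    return [[vocab[i] for i in np.argsort(row)[::-1][:n]] for row in topic_word]

# Hypothetical 2-topic model over a 6-word vocabulary; rows sum to 1.
vocab = ["love", "moon", "night", "war", "fight", "land"]
topic_word = np.array([
    [0.30, 0.26, 0.24, 0.08, 0.06, 0.06],   # a "romance" topic
    [0.05, 0.05, 0.05, 0.30, 0.28, 0.27],   # a "conflict" topic
])

print(top_n_words(topic_word, vocab))
# → [['love', 'moon', 'night'], ['war', 'fight', 'land']]
```

Inspecting these word lists by eye is exactly the manual step that the coherence measures in the surrounding papers try to automate.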
Measuring Topic Coherence Through Statistically Validated Networks
2020
Topic models arise from the need to understand and explore large text document collections and to predict their underlying structure. Latent Dirichlet Allocation (LDA) (Blei et al., 2003) has quickly become one of the most popular text modelling techniques. The idea is that documents are represented as random mixtures over latent topics, where a distribution over words characterizes each topic. Unfortunately, topic models offer no guarantee of the interpretability of their outputs. The topics learned from texts may be characterized by a set of irrelevant or unconnected words. Therefore, topic models require validation of the coherence of the estimated topics. However, the automatic evaluation …
Ranking coherence in topic models using statistically validated networks
2023
Probabilistic topic models have become one of the most widespread machine learning techniques in textual analysis. Topic discovery is an unsupervised process that does not guarantee the interpretability of its output. Hence, the automatic evaluation of topic coherence has attracted the interest of many researchers over the last decade, and it remains an open research area. This article offers a new quality evaluation method based on statistically validated networks (SVNs). The proposed probabilistic approach consists of representing each topic as a weighted network of its most probable words. The presence of a link between each pair of words is assessed by statistically validating their co-oc…
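One plausible instantiation of the link-validation step described above is a hypergeometric test of each word pair's document co-occurrence count, with a multiple-comparison correction over all pairs. This is a hedged sketch, not the paper's exact procedure; the counts and network size are hypothetical.

```python
from scipy.stats import hypergeom

def cooccurrence_pvalue(n_docs, n_a, n_b, n_ab):
    """P(co-occurrence >= n_ab) under the null that word a's n_a documents are
    drawn uniformly at random from the n_docs corpus, independently of word b,
    which appears in n_b documents."""
    return hypergeom.sf(n_ab - 1, n_docs, n_a, n_b)

# Hypothetical counts: 1000 docs; word a in 100, word b in 80, together in 40.
# Independence would predict ~8 joint docs, so 40 is a strong excess.
p = cooccurrence_pvalue(1000, 100, 80, 40)

# Bonferroni threshold over all 100*99/2 pairs of a 100-word network (hypothetical).
alpha = 0.05 / 4950
validated = p < alpha   # keep the link only if the co-occurrence is significant
```

Links that survive the corrected threshold form the statistically validated network whose connectivity the article uses to rank topic coherence.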
Establishing Video Game Genres Using Data-Driven Modeling and Product Databases
2015
Establishing genres is the first step toward analyzing games and how the genre landscape evolves over the years. We use data-driven modeling that distils genres from textual descriptions of a large collection of games. We analyze the evolution of game genres from 1979 to 2010. Our results indicate that until 1990 many genres competed for dominance, but thereafter sport-racing, strategy, and action became the most prevalent genres. Moreover, we find that games vary to a great extent as to whether they belong mostly to one genre or to a combination of several genres. We also compare the results of our data-driven model with two product databases, Metacritic and Mobyga…
Online Sparse Collapsed Hybrid Variational-Gibbs Algorithm for Hierarchical Dirichlet Process Topic Models
2017
Topic models for text analysis are most commonly trained using either Gibbs sampling or variational Bayes. Recently, hybrid variational-Gibbs algorithms have been found to combine the best of both worlds. Variational algorithms are fast to converge and more efficient for inference on new documents. Gibbs sampling enables sparse updates since each token is only associated with one topic instead of a distribution over all topics. Additionally, Gibbs sampling is unbiased. Although Gibbs sampling takes longer to converge, it is guaranteed to arrive at the true posterior after infinitely many iterations. By combining the two methods it is possible to reduce the bias of variational methods while …
Social Collaborative Viewpoint Regression with Explainable Recommendations
2017
A recommendation is called explainable if it not only predicts a numerical rating for an item, but also generates explanations for users' preferences. Most existing methods for explainable recommendation apply topic models to analyze user reviews to provide descriptions along with the recommendations they produce. So far, such methods have neglected user opinions and influences from social relations as a source of information for recommendations, even though these are known to improve the rating prediction. In this paper, we propose a latent variable model, called social collaborative viewpoint regression (sCVR), for predicting item ratings based on user opinions and social relations. To th…